(57) A sign language telephone device is offered
which enables an aurally handicapped person who uses
the sign language to converse with a normal person at a
distant place who does not know the sign language. The
sign language telephone device is placed on the side of
the aurally handicapped person. Hand gestures of
the sign language inputted from a sign language input
means are recognized as the sign language, and the
recognized sign language is translated to Japanese.
The translated Japanese word train is converted to
synthesized voices, which are transmitted to a videophone
on the side of the normal person. The voices from the
videophone are recognized, and the recognized Japanese
is translated to the sign language to generate sign
language animations, which are displayed on the screen
of a TV set on the side of the aurally handicapped
person. According to the present invention, an aurally
handicapped person can easily converse, through an
existing network, with a normal person at a distant
place who does not know the sign language.
Description
Technical Field 

The present invention relates to a sign language 
telephone device to be used in a case where an aurally 
handicapped person talks with a normal person in a dis- 
tant place who does not know the sign language. 

Background Art 

The sign language has been developed as a means of
communication between aurally handicapped persons.
By using the sign language, an aurally handicapped
person is able to converse directly with another
aurally handicapped person close to him or her
with hand gestures, body gestures, facial expressions,
etc. In the case of communication between aurally
handicapped persons apart from each other, the
transmission of will has been possible in real time by
performing sign language gestures using videophone devices.

On the other hand, research on sign language
translation systems has recently been actively
performed so that an aurally handicapped person who uses
the sign language is able to converse with a normal person
who does not know the sign language (Reference:
Masaru Ohki, Hirohiko Sagawa, Tomoto Sakiyama, Eiji
Ohira, Hiromichi Fujisawa: Information Processing
Media Research Society, 15-6, Information Processing
Society of Japan, 1994). The sign language translation
system is composed of a
sign-language-to-Japanese-translation-subsystem and a
Japanese-to-sign-language-translation-subsystem.

(1) The sign-language-to-Japanese-translation-subsystem
is composed of a sign language recognition unit
which recognizes the sign language and
translates it to a sign language word train, and a
sign-language-to-Japanese-translation-unit which
translates the recognized sign language words to
Japanese. In the sign language recognition unit, the
gestures of hands are inputted using a glove-based
input, the input hand gesture is compared with
standard hand gestures, and the sign language word
which has the closest standard hand gesture is
selected. The sign-language-to-Japanese-translation-unit
translates a sign language word train to
Japanese using a correspondence table between
sign language words and Japanese words and a
conversion rule from a sign language sentence to a
Japanese sentence.

(2) The Japanese-to-sign-language-translation-subsystem
is composed of a Japanese-to-sign-language-translation-unit
which translates Japanese to the
sign language, and a sign language generation unit
which displays the sign language as an animation
using 3-dimensional computer graphics. The
Japanese-to-sign-language-translation-unit analyzes
Japanese and translates it to a sign language
word train using a correspondence table
between Japanese words and sign language
words and a conversion rule from Japanese
sentences to sign language sentences. The sign
language generation unit generates sign language
animations using a (sign language words)-(animation
data) dictionary which stores sets of indexes of
sign language words and the corresponding data of
gestures of hands or countenances which are
registered beforehand. In the generation of a sign
language animation, the sign language animation data
corresponding to the sign language words in a sign
language word train are retrieved, and a human
body model moves based upon the retrieved data.
The movement of the model is made to appear
continuous by interpolating the gaps between the
sign language words.
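
The recognition and table-lookup steps of subsystem (1) can be sketched as follows. This is only an illustration under stated assumptions, not the method of the cited system: the three-element feature vectors, the use of Euclidean distance, the sample words, and the names `STANDARD_GESTURES`, `WORD_TABLE`, `recognize_word` and `translate_word_train` are all hypothetical, and the sentence-level conversion rule is omitted.

```python
import math

# Hypothetical standard hand gestures: sign language word -> feature vector.
STANDARD_GESTURES = {
    "MORNING": [0.9, 0.1, 0.0],
    "GREETING": [0.1, 0.8, 0.2],
    "FINE": [0.0, 0.2, 0.9],
}

# Hypothetical correspondence table: sign language word -> Japanese word.
WORD_TABLE = {"MORNING": "朝", "GREETING": "挨拶", "FINE": "元気"}


def recognize_word(gesture):
    """Select the sign language word whose standard gesture is closest."""
    return min(
        STANDARD_GESTURES,
        key=lambda word: math.dist(gesture, STANDARD_GESTURES[word]),
    )


def translate_word_train(gestures):
    """Recognize each gesture, then look each word up in the table."""
    words = [recognize_word(g) for g in gestures]
    return [WORD_TABLE[w] for w in words]
```

A real recognizer would compare time series of glove data rather than single vectors; the sketch only shows the "closest standard gesture, then table lookup" structure described above.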

However, the sign language translation system is
basically developed for direct communication
between an aurally handicapped person and a normal
person close to each other, so that it is not shown
how to simply apply the configuration to a long distance
call (conversation).

If the conventional sign language translation system
is extended to apply to a long distance call, several
problems arise.

In the first place, there is a problem in that the
configuration of the device becomes large-scaled and
complicated. To begin with, the above-mentioned
sign language translation system is supposed to be a
stand-alone type system, and in a case where it is
extended to be applied to a long distance call, the
following ordinary form can be considered: the
sign-language-to-Japanese-translation-subsystem and
the Japanese-to-sign-language-translation-subsystem
are separately composed and these systems are
connected to each other through a network.

However, in the case of the
sign-language-to-Japanese-translation-subsystem and the
Japanese-to-sign-language-translation-subsystem in a
conventional sign language translation system, the
dictionary data base or the correspondence table between
the sign language words and Japanese words in the
sign-language-to-Japanese-translation-unit
(Japanese-to-sign-language-translation-unit) is used in
common in order to economize on storage capacity.

For example, for the sake of long distance calls, if
the sign-language-to-Japanese-translation-subsystem
and the Japanese-to-sign-language-translation-subsystem
are made separate and independent from
each other, and the
sign-language-to-Japanese-translation-subsystem is
provided on the side of an aurally handicapped person
and the Japanese-to-sign-language-translation-subsystem
is provided on the side of a normal person, then the
identical data for translation have to be provided in
duplicate, which will naturally make the device
configuration large-scaled and complicated.

In the second place, there is another problem in
that it is difficult to use an existing network for long
distance calls (conversation). In a case where the
sign-language-to-Japanese-translation-subsystem is
provided on the aurally handicapped person side and the
Japanese-to-sign-language-translation-subsystem is
provided on the normal person side, it is necessary to
transmit translated Japanese sentences or sign language
animations from each subsystem to the other.
In particular, the transmission of sign language
animations involves the transmission of a large
quantity of images, so that long distance calls require
a network infrastructure able to cope with
high speed transmission of a large volume of data.
Image transmission is possible with the
present videophone facilities; however, in the case of
the sign language, unless the subtle form and movement
of hands, etc. are accurately transmitted and displayed,
misunderstandings or erroneous recognition
may be caused, which may lead to trouble in
communication.

Therefore, up to now, for an aurally handicapped
person who uses the sign language, there has been no
means to converse easily with a normal person
in a distant place who does not know the sign language.
Accordingly, they have communicated with each other by
transmitting characters or pictures using facsimile. Thus,
an aurally handicapped person who wants to talk
in the sign language has had some trouble
communicating with a normal person in a distant place
who does not know the sign language.

The purpose of the present invention is to offer a
simple device with which an aurally handicapped person
who uses the sign language is able to communicate
with a normal person in a distant place who does not
know the sign language.

Another purpose of the present invention is to offer
a device which makes it possible for an aurally
handicapped person who uses the sign language to
communicate, through an existing network, with a normal
person in a distant place who does not know the sign
language.

Disclosure of Invention 

The present invention proposes a new concept
called a sign language telephone device. In short, the
present invention allows an aurally handicapped person
who uses the sign language to communicate with a
normal person in a distant place who does not know the
sign language using the infrastructure of the existing
videophone facilities. In the case of the present
invention, the videophone on the side of an aurally
handicapped person who uses the sign language is provided
with both a sign-language-to-Japanese-translation-function
and a Japanese-to-sign-language-translation-function,
and it is connected to the videophone on the side of
a normal person through a network.

In the present invention, a videophone device having
a sign language translation function
(sign-language-to-Japanese-translation-function and
Japanese-to-sign-language-translation-function) to be
used by an aurally handicapped person is called a sign
language telephone device, and an ordinary videophone
device used by a normal person is called a videophone
device on the normal person side. The present invention
makes conversation possible between a sign
language telephone device and a videophone device on
the side of a normal person by performing sign language
translation.

The framework of the whole system according to
the present invention is fundamentally constituted with 3
elements, a sign language telephone device, a network
and a videophone device; however, one of the features
of the present invention is that various functions are
concentrated in the sign language telephone device.

The sign language telephone device comprises
several characteristic means such as a sign language
input means, a videophone connection means, the
sign-language-to-Japanese-translation-subsystem, and the
Japanese-to-sign-language-translation-subsystem,
besides a TV set, camera, microphone, and videophone
control device which are found in an ordinary
videophone device.

Supposing a case where an aurally handicapped
person actually calls a normal person in a distant place
on a sign language telephone device, the fundamental
operation of the present invention will be explained.

An aurally handicapped person dials the telephone
number of a normal person on the other end of the line,
and when the normal person comes to the phone, the
aurally handicapped person starts to communicate with
him or her. In that case, the aurally handicapped person
inputs the sign language through the sign language input
means in the sign language telephone device, and the
input sign language is recognized by the
sign-language-to-Japanese-translation-subsystem and
translated to a sign language word train and further
translated to Japanese. The translated Japanese is
outputted as a synthesized voice to the videophone on
the side of the normal person through the videophone
connection means and a network (public network). On the
videophone device on the side of the normal person, an
actual image inputted by a camera in the sign language
telephone device on the side of the aurally handicapped
person is displayed. When the voice is synthesized, it
can be adjusted to suit the aurally handicapped person:
a man's voice or a woman's voice, quality of voice,
speed of speaking, loudness of voice, high voice or low
voice, etc. can be selected. In the case of a female
aurally handicapped person, naturally a female voice is
desirable as the synthesized voice. In the case of a
young person, a high tone voice might be desirable. The
tone of the Japanese voice, which is the result of
translation of the sign language of an aurally
handicapped person, can be used to identify the aurally
handicapped person.
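
The selectable voice attributes listed above could be held in a small settings record. The sketch below is only an illustration: the class `VoiceProfile`, the helper `profile_for`, the field names, and the numeric values are assumptions and do not come from the specification or from any speech synthesis library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical settings for the synthesized voice of one user."""
    gender: str = "female"    # "male" or "female"
    pitch: float = 1.0        # relative pitch; above 1.0 is a higher voice
    speed: float = 1.0        # relative speaking speed
    loudness: float = 1.0     # relative loudness
    quality: str = "neutral"  # free-form label for the quality of voice

    def __post_init__(self):
        if self.gender not in ("male", "female"):
            raise ValueError("gender must be 'male' or 'female'")
        for name in ("pitch", "speed", "loudness"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


def profile_for(is_female, is_young):
    """Pick defaults in the spirit of the text: female voice, higher tone for the young."""
    return VoiceProfile(
        gender="female" if is_female else "male",
        pitch=1.2 if is_young else 1.0,  # 1.2 is an arbitrary example value
    )
```

Because each user keeps a distinct profile, the listener can tell speakers apart by the tone of the synthesized voice, as the paragraph above suggests.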

On the side of the normal person, the response is
given to the videophone device by voice. The voice,
transmitted through a network (a public network) and a
videophone connection device in the sign language
telephone device, is recognized in the
Japanese-to-sign-language-translation-subsystem, the
recognized Japanese is translated to the sign language,
and the translated sign language is expressed as a sign
language animation and displayed on the TV set.

The above-mentioned procedures are repeated:
the aurally handicapped person responds in the
sign language, and the normal person basically
responds by voice. In the case where a normal person
calls an aurally handicapped person on a videophone,
the procedures are almost the same as in the
above-mentioned case except for the initial dialing.

In the case of communication using the sign
language as described above, some further
contrivances are necessary.

In the first place, in the
sign-language-to-Japanese-translation-subsystem, it is
made possible to select a translation mode or a
non-translation mode.

For an aurally handicapped person, the sign language
is the means of communication, so that there is a fear
that all hand gestures may be recognized as gestures
for communication. While a sign language telephone
device is being used, a movement of hands not
included in the sign language, for example, the
movement of a hand for drinking coffee, may be recognized
as a gesture in the sign language. To avoid this, in
the translation mode of the present invention, the
movement of hands is translated as the sign language, but
in the non-translation mode the movement of hands is
not translated. The following methods of changeover
between the translation mode and the non-translation
mode can be considered:

(1) A method performed with a button.

(2) A method in which the non-translation mode is
selected when the face is not looking forward.

(3) A method in which the translation mode and the
non-translation mode can be changed over by
performing a predetermined special hand gesture.

(4) A method in which the non-translation mode is
selected when at least one hand is placed at the home
position, etc.
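
The four changeover methods can be combined in one decision routine, sketched below. The specification lists them as alternatives; combining them, treating methods (1) and (3) as toggles and methods (2) and (4) as temporary suppression, and the names `Observation` and `ModeController` are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical per-frame inputs, one flag for each changeover method."""
    button_pressed: bool = False  # method (1): mode button
    face_forward: bool = True     # method (2): face looking forward
    toggle_gesture: bool = False  # method (3): special hand gesture
    hand_at_home: bool = False    # method (4): a hand at the home position


class ModeController:
    def __init__(self):
        # State toggled by methods (1) and (3); starts in translation mode.
        self.enabled = True

    def update(self, obs):
        """Return True if hand gestures should be translated for this frame."""
        if obs.button_pressed or obs.toggle_gesture:
            self.enabled = not self.enabled
        # Methods (2) and (4) suppress translation only while the condition holds.
        suppressed = (not obs.face_forward) or obs.hand_at_home
        return self.enabled and not suppressed
```

For example, resting a hand at the home position pauses translation for that frame without changing the toggled state, so translation resumes when the hand is raised again.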


In the second place, at the videophone on the side of
a normal person, not only the actual image but also an
animation can be displayed. When an aurally handicapped
person talks with a normal person whom he or she does
not know well, in most cases the aurally handicapped
person is reluctant to show his or her actual image. In
particular, a female person in many cases feels
reluctant to show her actual image when the call is from
a stranger. Therefore, the
sign-language-to-Japanese-translation-subsystem
comprises a conversion means to convert the input hand
gesture data to a sign language animation using the hand
gestures inputted from the sign language input means and
the expressions of the face which are taken in from a
camera and recognized. In the image mode, the actual
image data from the camera are displayed, and in the
animation mode, the sign language animation is displayed
for the protection of privacy.

In the third place, the display on the sign language
telephone device on the side of an aurally handicapped
person and the display on the videophone on the side of
a normal person are synchronized. It takes time to
translate the sign language of an aurally handicapped
person to Japanese. Therefore, there is a probability that
the actual image of the aurally handicapped person and
the voice and character train of Japanese translated
from the sign language of the aurally handicapped person
are displayed discontinuously and asynchronously
on the time axis on the screen of the videophone device
on the side of the normal person. The present invention
comprises a means to display them in synchronization.

It also takes time to recognize the spoken Japanese
of a normal person, convert it to a character train, and
convert it to a sign language animation. Therefore, there
is a probability that the actual image of the normal person
sent to the screen of the sign language telephone
device on the side of the aurally handicapped person and
the displayed sign language animation obtained by
translating the spoken Japanese of the normal person
are displayed discontinuously and asynchronously on
the time axis. The present invention comprises a means
to display them in synchronization.

To be concrete, the actual image is given a time 
stamp and the time stamp is adjusted to a time stamp 
given to the translated and displayed image for synchro- 
nization. 

For example, in a case of direct conversation without
using a sign language telephone device, the periods
of time needed are as shown below:

0.0 sec to 2.0 sec [sign language] Good morning!
2.0 sec to 5.0 sec [sign language] How are you?
5.5 sec to 8.0 sec [voice] I'm fine.

When the aurally handicapped person finishes speaking
in the sign language, the normal person starts to speak
by voice. Assuming that the translation in the sign
language telephone device is started after each
utterance is finished, the result of translation of the
sign language performed during 0.0 sec to 2.0 sec is,
for example, delivered to the videophone as a
synthesized voice during 2.0 sec to 4.0 sec.

0.0 sec to 2.0 sec [sign language] Good morning!
2.0 sec to 4.0 sec [synthesized voice] Good morning!
2.0 sec to 5.0 sec [sign language] How are you?
4.0 sec to 7.0 sec [synthesized voice] How are you?
7.0 sec to 10.0 sec [voice] I'm fine.
10.0 sec to 13.0 sec [sign language animation] I'm fine.

If the actual image is transmitted to the videophone on
the side of the normal person without synchronizing, the
gesture "Good morning!" is sent first, and at the time
when the gesture "How are you?" is performed, the
translation of the sign language "Good morning!" is
output as a synthesized voice, so that the actual image
and the synthesized voice, a result of translation,
deviate from each other in time. This gives the receiver
an incongruous feeling, so it is desirable that the
actual image and the synthesized voice, a result of
translation, be synchronized. In the present invention,
the actual image and the conversation in the sign
language are recorded together with time, and the result
of translation of the sign language is given the time
when the sign language was actually performed. The
actual image and the synthesized voice are synchronized
by making these times coincide, and after that they
are transmitted to the videophone device on the side of
the normal person. For the actual image and voice
sent from the side of the normal person, time is
recorded in a similar way. When the voice is recognized,
translated to the sign language, and displayed as an
animation, the actual image and the sign language
animation are synchronized by utilizing the time and are
then displayed. When they are synchronized, in some
cases a shortage of actual images or of display time
occurs. In the present invention, when the actual
images to display are not enough, a still picture taken
at the time when the shortage becomes clear is displayed.
When the display time is not enough, the actual image is
fast-forwarded or is not displayed.
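
The time-stamp-based synchronization described above can be sketched as a scheduling routine. This is a simplified illustration, not the procedure of the embodiment: the record `Result`, the function `schedule`, the rule that each output starts when its translation is ready and the previous output has ended, and the labels "still", "fast" and "normal" are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical translation result tagged with source time stamps."""
    src_start: float  # time when the sign language gesture started
    src_end: float    # time when the gesture ended
    ready: float      # time when the translation became available
    duration: float   # length of the synthesized voice in seconds
    text: str


def schedule(results):
    """Pair each result with the actual image recorded over its source interval."""
    out = []
    cursor = 0.0  # end time of the previous output
    for r in sorted(results, key=lambda r: r.src_start):
        start = max(r.ready, cursor)
        video_len = r.src_end - r.src_start
        if video_len < r.duration:
            fill = "still"   # actual images run short: hold a still picture
        elif video_len > r.duration:
            fill = "fast"    # display time runs short: fast-forward the image
        else:
            fill = "normal"
        out.append((start, start + r.duration, (r.src_start, r.src_end), fill, r.text))
        cursor = start + r.duration
    return out
```

Each output tuple tells the transmitter which recorded image interval to play together with which synthesized voice, so the receiver sees the gesture and hears its translation at the same time.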

In the fourth place, as a means for a response
message to a telephone call when nobody is in, the
present invention comprises a means to prepare a message
by combining items selected out of voices, images,
characters and sign language animations. In this case
too, as mentioned for the second feature, it is effective
from the point of view of protecting privacy to
prepare the response for when nobody is in using an
animation instead of an actual image.

In the fifth place, the present invention comprises a
means to display characters, which are the results of
recognition of the voice of a normal person, together
with a character train which is obtained by translating
the result of recognition of the sign language into
Japanese, on the videophone on the side of the normal
person. Thereby, on the side of the normal person, it is
made possible to confirm whether the contents of
conversation spoken by him or her are correctly
transmitted to the sign language telephone device or not.

Further objects and configurations will be made
clear by the explanation of the embodiments
shown in the following.

Brief Description of Drawings 

Fig. 1 shows the block diagram of the hardware of a
sign language telephone device showing an
embodiment according to the present invention;
Fig. 2 shows the system block diagram of the sign
language telephone device;
Fig. 3 shows the software block diagram of the
sign-language-to-Japanese-translation-subsystem;
Fig. 4 shows the software block diagram of the
Japanese-to-sign-language-translation-subsystem;
Fig. 5 shows the process flowchart of the
sign-language-to-Japanese-translation-control unit;
Fig. 6 shows the process flowchart of the gesture to
sign language animation conversion unit;
Fig. 7 shows the process flowchart of the image
generation unit;
Fig. 8 shows a screen example of the sign language
standard mode of a sign language telephone
device;
Fig. 9 shows a screen of transmission example of a
videophone device;
Fig. 10 shows an illustrative drawing of a
synchronizing method between the sign language and
translation results, or between the voice and
recognition results;
Fig. 11 shows the process flowchart of the image,
voice and character synchronizing unit of the
Japanese-to-sign-language-translation-subsystem;
Fig. 12 shows the process flowchart of the image,
voice and character synchronizing unit in the
sign-language-to-Japanese-translation-subsystem;
Fig. 13 shows the process flowchart of display of
the voice recognition unit;
Fig. 14 shows the process flowchart of an answer
phone control unit;
Fig. 15 shows the response process flowchart of a
response message of an answer phone;
Fig. 16 shows a screen example in the sign language
enlargement mode of a sign language telephone
device;
Fig. 17 shows a screen example in the sign language
animation mode of a sign language telephone
device;
Fig. 18 shows the screen display flowchart of a sign
language telephone device;
Fig. 19 shows the setting process flowchart of a
synthesized voice;
Fig. 20 shows the block diagram of a sign language
telephone device being provided with an interpretation
server;
Fig. 21 shows a display screen of a response message
in an ordinary mode of a videophone device,
and Fig. 22 shows a display screen of a response
message in a sign language mode of a videophone
device.

Best Mode for Carrying Out the Invention 

In the following, the present invention will be 
explained referring to drawings. Fig. 1 shows the hard- 
ware block diagram of a sign language telephone 
device, and Fig. 2 shows its system block diagram. Fig. 
3 and Fig. 4 show software module block diagrams of a 
sign language telephone device. 

In Fig. 1, a TV set 104, camera 102, microphone
122, speaker 126 and mode switch 130 are connected
to the sign language telephone device 110. Hand
gestures performed by an aurally handicapped person are
inputted to the sign language telephone control device
using a special glove 114 (for example, Data Glove (the
trade mark of VPL Research Inc.): a device for inputting
the shape or position of hands). The sign language
telephone control device 110 generates voices and
images to communicate with a videophone on the side
of a normal person and sends them to the videophone
control device 118. The videophone control device 118
regards images and voices sent from the sign language
telephone control device 110 as the images and voices
sent from a camera or microphone, and sends them to
a videophone device on the side of a normal person
through a telephone line 120 which forms a network
(public network). The images and voices sent from the
videophone on the side of a normal person are received
by the videophone control device 118 through the
telephone line 120 and are sent to the sign language
telephone device 110. The mode switch 130 sets the
translation mode of the sign language and the picture
mode to be displayed on a videophone device on the
side of a normal person. The videophone on the side of
a normal person is an ordinary videophone having no
sign language translation function; it is provided
with a control table (or a keyboard) equipped with a TV
set, camera, microphone, speaker and a group of
switches, and comprises a videophone control device.

The sign language telephone control device 110
comprises the following functions for translating the
sign language to Japanese (the
sign-language-to-Japanese-translation-subsystem) among
the basic sign language functions.

(1) A function to recognize the sign language
performed by an aurally handicapped person, to
translate it to Japanese and to convert it to a
character train.

(2) A function to convert the translated Japanese
character train into synthesized voices.

(3) A function to display the translated Japanese
character train on the TV set 104 for confirming
whether the sign language performed by an aurally
handicapped person is correctly translated or not,
or to translate the translated Japanese character
train to the sign language and to display it as a
sign language animation on the TV set 104.


The sign language telephone control device 110
comprises the following functions for translating
Japanese to the sign language (the
Japanese-to-sign-language-translation-subsystem)
among the basic sign language telephone functions.

(1) A function to recognize the voice spoken by a
normal person and convert it to a Japanese
character train.

(2) A function to translate the converted character
train to the sign language and display it on the TV
set 104 as a sign language animation.

(3) A function to compose the recognized Japanese
character train with the image inputted from the
camera 102 and transmit it to the videophone on
the side of a normal person for confirming
whether the voice spoken by him or her is correctly
recognized or not.

The sign language telephone control device 110
comprises the following functions besides
those described above.

(1) A function to set a mode in which
hand gestures are not recognized as the sign
language. When a person wants to drink coffee while
talking on the phone, the function prevents the hand
gestures from being recognized as the sign language.
The function can also be used when aurally
handicapped persons consult with each other about a
matter which they do not want to be known by the person
at the other end of the line.

(2) A function which makes it possible to display the
data of hand gestures as they are, as an animation
in place of the actual image on the
side of an aurally handicapped person. The
function can be used when the person to talk with is an
aurally handicapped person and when it is desirable
not to show the actual image to him or her.

(3) A function which allows the following operation:
the result of translation from the sign language to
Japanese or from Japanese to the sign language
is synchronized with the actual image and outputted
to the videophone device on the side of a normal
person or the TV set on the side of an aurally
handicapped person. It takes time to translate from
the sign language to Japanese or from Japanese to
the sign language, and if the actual image is
transferred without time delay, the actual image and the
translation result deviate from each other
in time, which makes it difficult to understand
what is said by the person at
the other end of the line. The function is able to
prevent this difficulty in understanding.

(4) A function which makes it possible to display the
sign language as an animation in place of an actual
image, besides the display of voices or characters,
when a response message of an answer phone is
generated.

(5) A function which makes it possible to display
characters, which are a result of recognition of the
voice of a normal person, together with a character
train which is obtained by translating the result of
recognition of the sign language into Japanese, on a
videophone device on the side of a normal person.

Fig. 2 shows a block diagram of the hardware
system of the sign language telephone control device 110.
The control of the sign language telephone device is
executed by a CPU 210. The program or data are stored
in a magnetic disc including a program memory area or
mode data memory area, and when control is executed
they are loaded to a memory 222. The operation of the
sign-language-to-Japanese-translation-subsystem
which translates the sign language to Japanese and the
Japanese-to-sign-language-translation-subsystem
which translates Japanese to the sign language is
executed under the control of the CPU 210 while the
program memory area in the magnetic disc 202 is
loaded in the memory 222. (The software configuration
of the sign-language-to-Japanese-translation-subsystem
and the Japanese-to-sign-language-translation-subsystem
will be explained referring to Fig. 3 and Fig. 4.)

The exchange of image data or voice data is
performed through the videophone connection unit 216
between the sign language telephone control device
110 and the videophone control device 118. Hand
gestures are inputted from the glove-based input 114
through a hand gesture input unit 218. Actual images
from the camera are inputted from the video input unit
212. From the image output unit 204 image data are
outputted to the TV set 104. Voices are taken in from the
microphone 122 using the voice input unit 206. Voices
are outputted from a voice output unit 214 through the
speaker 126. Hand gestures or actual images are held
in the memory 222 for a predetermined period of time for
the processes of recognition, etc. Pictures, etc. to be
output to the television are displayed using the data
stored once in a memory. The setting of modes such as
a translation mode or non-translation mode is partly
performed through a mode data setting unit 220. The
setting of a mode using the mode setting unit is
performed using the mode switch 130 fixed on the sign
language telephone device.

Next, the software configuration of the sign-language-to-
Japanese-translation-subsystem, which translates the sign
language expressed by an aurally handicapped person into
Japanese, will be explained.

The sign language hand gestures are inputted from the
glove-based input 114 (for example Data Glove, a trademark of
VPL Research, Inc.: a device for inputting the shape of hands
or their positions) through the hand gesture input unit 218.
The expressions on the face or the like are inputted from the
camera 102 through the video input unit 212 shown in Fig. 2.
The hand gestures in the sign language inputted from the hand
gesture input unit 218 are recognized in a hand gesture
recognition unit 310. The position of the face or the
expressions on the face are recognized in an image recognition
unit 312. The recognition result of the hand gesture
recognition unit 310 and that of the image recognition unit 312
are integrated and recognized as the sign language in the sign
language integrated recognition unit 320. The recognition
methods are described in Japanese Patent Laid-open Application
No. Hei 6-253457 (Sign Language Recognition Device) and in the
paper (Hirohiko Sagawa, Hiroshi Sako, Masahiro Abe: Sign
Language Interpretation System Using Continuous DP Matching,
Human Interface Research Society, Information Processing
Society of Japan, 44-12, 1992).
Whether the recognized sign language is to be translated to
Japanese is controlled in the sign-language-to-Japanese-
translation-control unit 324. Whether it is to be translated or
not is decided according to the mode, that is, the translation
mode or non-translation mode. The translation mode is a mode in
which sign language is to be translated to Japanese, and the
non-translation mode is a mode in which sign language is not to
be translated to Japanese. The changeover between the
translation mode and the non-translation mode can be performed
with a manual operation for mode changeover, an automatic
changeover to the non-translation mode when a hand is
positioned at the home position (on the knee), or the mode
setting with a switch equipped in the mode data setting unit
220 shown in Fig. 2, etc. The translation mode data used in the
sign-language-to-Japanese-translation-control unit 324 are
stored in a translation mode data memory unit 380. In the
translation mode, the sign language recognized in the sign
language integrated recognition unit 320 is transferred to a
sign-language-to-Japanese-translation-unit 334. The
sign-language-to-Japanese-translation-method has been made
public in the paper (Masahiro Abe, Hiroshi Sako, Hirohiko
Sagawa: Sign Language to Sentence Conversion Method Based on
Sentence Structure Meaning Analysis, the Institute of
Electronics, Information and Communication Engineers of Japan,
Vol. J76-D-II, No. 9, pp. 2023-2030, 1993). When translation in
the sign-language-to-Japanese-translation-unit 334 is found to
be impossible, a translation-impossible-indication generation
unit 338 displays that the translation was impossible on the TV
set 104, the display of the sign language telephone device. To
confirm whether the sign language is correctly translated, a
sign language animation generation unit 344 generates an
animation for displaying the recognized sign language on the
sign language telephone device. A sign language animation
method has been made public in the paper (Tomoko Sakiyama, Eiji
Ohira, Hirohiko Sagawa, Masahiro Abe, Kiyoshi Arai: Study on
Sign Language Generation Method by Animation, No. 46 All Japan
Meeting of the Information Processing Society of Japan, 8P-4,
1993). The Japanese translated in the sign-language-to-
Japanese-translation-unit 334 is converted to the voice in
Japanese in a voice synthesizing unit 350. The voice in
Japanese and the Japanese text which is the result of
translation in the sign-language-to-Japanese-translation-unit
are sent to an image, voice and character synchronizing unit
356 in order to generate images to transmit to the videophone
device on the side of a normal person.

For the image to be sent to the videophone device on the side
of a normal person, there are two cases: one is the case where
an actual image (an image containing the face, etc. of an
aurally handicapped person obtained from a camera) is used, and
the other is the case where an animation, not an actual image,
is used. In the case where the actual image is used, an image
obtained from a camera through the video input unit 212 is
used. On the other hand, in the case where an animation is
used, the animation is generated in the gesture to sign
language animation conversion unit 328 using untouched hand
gesture data obtained from the glove-based input 114 through
the hand gesture input unit 218 and the hand gesture
recognition unit 310. The image generation unit 332 controls
whether an actual image or an animation is to be sent to the
videophone device on the side of a normal person. The image,
voice and character synchronizing unit 356 synchronizes an
image generated in the image generation unit 332, a character
train generated in the sign-language-to-Japanese-translation-
unit 334 and a synthesized voice generated in the voice
synthesizing unit 350, and generates a picture to be sent to a
videophone device on the side of a normal person.

The image, character train and synthesized voice generated in
the image, voice and character synchronizing unit 356 are
transferred to an answer phone control unit 366. If the device
is not in the response message registration mode of the answer
phone, the image, character train and synthesized voice
generated in the image, voice and character synchronizing unit
356 are transferred to the videophone picture generation unit
360. In the videophone picture generation unit 360, the
transferred image, the character train obtained by translating
the sign language to Japanese, etc., an output 430 which is the
result of recognition of the voice from the videophone device
on the side of a normal person (430: the output of a
display-data-for-voice-recognition-confirmation generation unit
428 in the Japanese-to-sign-language-translation-subsystem on
the side of a normal person (to be explained later)), and an
error message 424 to be issued when the translation to the sign
language is not possible (424: the output of the
translation-impossible-indication generation unit 420 in the
Japanese-to-sign-language-translation-subsystem on the side of
a normal person (to be explained later)) are integrated, and
the picture to be transmitted to a videophone device on the
side of a normal person is generated. The generated picture and
voice are combined, and the combined picture and voice are
transmitted from a television picture and voice transmitting
unit 382 to a videophone device on the side of a normal person
through the videophone connection unit 216, the videophone
control device 118 and the telephone line network 120.

The above is the explanation of the sign-language-to-
Japanese-translation-subsystem; following it, the function of
an answer phone in the sign language telephone device will be
explained. An answer phone in a sign language telephone device
needs some functions different from those of an ordinary
telephone. In the case of a sign language telephone device,
there is a probability that an aurally handicapped person,
besides a normal person, uses the telephone. When the aurally
handicapped person called on the telephone is at home, there
will be an ordinary telephone conversation between aurally
handicapped persons, so that the communication between them is
of course possible when the device is operated as an ordinary
videophone device. If there is a telephone call from an aurally
handicapped person, rather than a normal person, while the
called aurally handicapped person is out, a response with voice
(a response while nobody is in) does not work. Therefore a
response message in the sign language or characters, other than
voice, becomes necessary as a function of an answer phone.
However, many people do not want to send an actual image when a
response in the sign language is performed. This is because,
when there is a telephone call from a stranger while one is
out, it is not desirable from the point of view of protection
of privacy to use an actual image of the sign language as a
response message. In such a case, a response message is
generated with an animation generated in the gesture to sign
language animation conversion unit 328. To do this, before the
response message is generated, the sign language animation mode
shall be selected as a screen mode.

If it is not the mode to generate a response message for an
answer phone, the answer phone control unit 366 transfers
images or voices generated in the image, voice and character
synchronizing unit 356 to the videophone picture generation
unit 360. If it is the mode to generate a response message for
an answer phone, images or voices generated in the image, voice
and character synchronizing unit 356 are stored in an answer
phone data memory unit 378. When there is a telephone call
during the answer phone mode, the response to it is made by
taking out a response message from the answer phone data memory
unit 378. Further, the system is configured to allow a normal
person to generate a response message for an answer phone by
inputting an output 454 from an image, voice and character
synchronizing unit 442 in the Japanese-to-sign-language-
translation-subsystem (to be explained later) to the answer
phone control unit 366. The message inputted by a normal person
is also stored in the answer phone data memory unit 378, which
makes it possible to read it later by translating it to the
sign language using the Japanese-to-sign-language-translation-
subsystem.
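
The routing performed by the answer phone control unit 366 can
be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the implementation
of the invention: the class name AnswerPhoneControl, its
attribute names, and the use of a plain tuple for the
synchronized image, characters and voice are all hypothetical.

```python
class AnswerPhoneControl:
    """Hypothetical stand-in for the answer phone control unit 366."""

    def __init__(self):
        self.register_mode = False   # True while a response message is being registered
        self.stored_message = None   # stands in for the answer phone data memory unit 378

    def handle(self, synchronized_output):
        """Route one output of a synchronizing unit (356 or 442).

        Returns the data to forward to the videophone picture
        generation unit 360, or None when the data was stored instead.
        """
        if self.register_mode:
            self.stored_message = synchronized_output
            return None
        return synchronized_output

    def answer_call(self):
        """Take out the registered response message for an incoming call."""
        return self.stored_message
```

In this sketch a message registered by either party is kept in
the same slot, mirroring the single answer phone data memory
unit 378 described above.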

Next, the software configuration of the Japanese-to-sign-
language-translation-subsystem, which translates Japanese
spoken by a normal person to the sign language, will be
explained referring to Fig. 4.

The voice spoken by a normal person and the image are
transmitted to the sign language telephone device on the side
of an aurally handicapped person through the telephone line
120. The voice, which is sent from the videophone connection
unit 216 shown in Fig. 2 by way of the videophone control
device 118, is received by a videophone voice data receiving
unit 406. The image, which is sent from the videophone
connection unit 216 shown in Fig. 2 by way of the videophone
control device 118, is received by a videophone image data
receiving unit 402. The voice received by the videophone voice
data receiving unit 406 is checked by an input changeover unit
458 to see whether the device is in a mode to generate a
response message for an answer phone, and if it is not, the
voice is converted to a Japanese character train in the voice
recognition unit 412. The recognized Japanese is translated to
the sign language in the Japanese-to-sign-language-
translation-unit 416. The Japanese-to-sign-language-
translation-method has been made public in the paper (Eiji
Ohira, Tomoko Sakiyama, Masahiro Abe, Hirohiko Sagawa: Basic
Study of Sign Language Generation System, No. 46 All Japan
Meeting of the Information Processing Society of Japan, 8P-3,
1993). In that case, if translation is impossible, a message
telling that translation was impossible is generated in the
translation-impossible-indication generation unit 420. In order
to confirm whether the voice recognition is correctly
performed, the recognized Japanese is displayed as a character
train by the display-data-for-voice-recognition-confirmation
generation unit 428. If it is a mode in which a normal person
generates a response message, voice is not inputted from the
videophone voice data receiving unit 406, but it is inputted
from the voice input unit 206 shown in Fig. 2 by way of the
microphone 122. The input changeover in the input changeover
unit 458 is performed by the output 386 from the answer phone
control unit 366 in the above-mentioned sign-language-to-
Japanese-translation-subsystem.

The translated sign language is converted into an animation in
the sign language animation generation unit 438. The sign
language animation generated in the sign language animation
generation unit 438, the image from the videophone image data
receiving unit 402 transmitted from a normal person, the
characters transmitted from the Japanese-to-sign-language-
translation-unit 416, and the voice from the videophone voice
data receiving unit 406 sent from a normal person are
synchronized in the image, voice and character synchronizing
unit 442. The image, character and voice synchronized in the
image, voice and character synchronizing unit 442 are sent to
the videophone picture generation unit 446. The image,
character and voice sent from the image, voice and character
synchronizing unit 442, the character train 336 translated to
Japanese from the sign-language-to-Japanese-translation-unit
334 in the above-mentioned sign-language-to-Japanese-
translation-subsystem, an output 346 from the sign language
animation generation unit 344, and the error message 340 from
the translation-impossible-indication generation unit 338 are
combined to generate a videophone picture in the videophone
picture generation unit 446. The picture is sent to the TV set
104 from the television picture and voice transmitting unit 450
through the image output unit 204 shown in Fig. 2 and displayed
on a screen of the TV set 104, and the voice is sent to the
speaker 126 through the voice output unit 214 shown in Fig. 2
and output as voice.

The process of the sign-language-to-Japanese-translation-
control unit 324 in the sign-language-to-Japanese-translation-
subsystem explained in Fig. 3 will be explained referring to
Fig. 5. The sign-language-to-Japanese-translation-control unit
324 controls the selection of the mode, either the translation
mode or the non-translation mode. In the case of voice, if one
does not speak, nothing is conveyed to the person at the other
end of the line, but in the case of the sign language, any
movement of a hand can be regarded as an expression of the sign
language. Therefore, the sign-language-to-Japanese-translation-
control unit 324 prevents a hand movement such as scratching
the head or reaching for a cup of coffee from being recognized
as an expression of the sign language by changing over the
mode.

There are several ways to change over a mode in the
sign-language-to-Japanese-translation-control unit 324. A hand
move for the translation mode and another hand move for the
non-translation mode are registered beforehand in the sign
language integrated recognition unit 320 as special sign
language. It is detected whether the telephone is cut or not
(502), and if the telephone is cut, the process is terminated.
When the telephone is not cut and the gesture of the
translation mode is recognized (506), the translation mode is
set (508). If the gesture of the non-translation mode is
recognized (510), the non-translation mode is set (512). When
the position of a hand is recognized to be at the home position
(514), the sign language or a hand move performed by the other
hand is not translated to Japanese. When the face is not
looking forward (516), or when the body is not facing forward
(520), the recognition results at that time are not translated
to Japanese. In the case of any other sign language,
translation mode data are taken in (522), and it is
investigated whether they indicate the translation mode (524);
if so, the recognition result is transferred to the
sign-language-to-Japanese-translation-unit 334 and translated
to Japanese (526). If not, the data are abandoned.
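
The decision sequence of Fig. 5 can be sketched as a small
Python function. This is a minimal illustration, assuming that
each recognition result already carries flags for the special
gestures, the home position, and the face and body direction;
the names Recognition, control_step and translate_stream are
hypothetical and do not appear in the specification. The step
numbers in the comments refer to the text above.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Mode(Enum):
    TRANSLATION = auto()
    NON_TRANSLATION = auto()


@dataclass
class Recognition:
    """One result from the integrated recognition unit 320 (fields are assumed)."""
    word: str
    is_translation_gesture: bool = False
    is_non_translation_gesture: bool = False
    hand_at_home: bool = False
    face_forward: bool = True
    body_forward: bool = True


def control_step(mode, rec):
    """Return (new_mode, word_to_translate_or_None) for one recognition result."""
    if rec.is_translation_gesture:          # 506 -> 508
        return Mode.TRANSLATION, None
    if rec.is_non_translation_gesture:      # 510 -> 512
        return Mode.NON_TRANSLATION, None
    if rec.hand_at_home:                    # 514
        return mode, None
    if not rec.face_forward or not rec.body_forward:  # 516, 520
        return mode, None
    if mode is Mode.TRANSLATION:            # 522, 524 -> 526
        return mode, rec.word
    return mode, None                       # data are abandoned


def translate_stream(recognitions, mode=Mode.TRANSLATION):
    """Run control_step until the input ends (standing in for the telephone being cut, 502)."""
    passed = []
    for rec in recognitions:
        mode, word = control_step(mode, rec)
        if word is not None:
            passed.append(word)
    return passed
```

Only the words returned by translate_stream would be handed to
the sign-language-to-Japanese-translation-unit 334 in this
sketch; everything else is dropped before translation.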

The process of the gesture to sign language animation
conversion unit 328 in the sign-language-to-Japanese-
translation-subsystem explained in Fig. 3 will be explained
referring to Fig. 6. If it is not the animation mode (602),
nothing will be done. An image to be sent to the side of a
normal person will be an actual image. If it is the animation
mode (602), the hand position data inputted from the hand
gesture input unit are made the data for animation hand
gestures (604). When a glove-based input is used, the data
concerning the position or direction of hands or the
inclination of fingers are inputted every 1/30 sec. Using these
data, an animation is generated by displaying an image for
animation every 1/30 sec. This method is the same as the method
used in generating an animation in the sign language animation
generation unit 344. The difference between them is that in the
sign language animation generation unit 344, the data
concerning the position of hands and the inclination of fingers
corresponding to the sign language word code are used, and in
the gesture to sign language animation conversion unit 328 the
live data of hand gestures are used. If the recognition of the
expressions on the face is made possible by the image
recognition, it is also made possible to give expressions on
the face in the animation using the recognition result.
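
The contrast between the two animation sources can be shown
with a short Python sketch. It assumes, purely for
illustration, that a hand pose is an opaque value and that a
dictionary maps a sign language word code to a stored pose
sequence; the function names and the dictionary are
hypothetical. Only the 1/30 sec frame period is taken from the
text.

```python
from fractions import Fraction

# One pose is sampled and one animation image is shown every 1/30 sec.
FRAME_PERIOD = Fraction(1, 30)


def timed_frames(poses):
    """Attach a display time (in seconds) to each pose, one per frame period."""
    return [(i * FRAME_PERIOD, pose) for i, pose in enumerate(poses)]


def live_animation(animation_mode, glove_samples):
    """Conversion unit 328: animate the live glove data (602, 604)."""
    if not animation_mode:
        return []  # nothing is done; the actual image is sent instead
    return timed_frames(glove_samples)


def word_animation(word_code, pose_dictionary):
    """Generation unit 344: animate the stored poses for a sign language word code."""
    return timed_frames(pose_dictionary[word_code])
```

Both paths share the same frame timing and differ only in where
the poses come from, which is the distinction the paragraph
above draws.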

Next, the process in the image generation unit 332 in the
sign-language-to-Japanese-translation-subsystem explained in
Fig. 3 will be explained referring to Fig. 7. Before that, the
screen of the sign language telephone device will be explained
referring to Fig. 8, and the screen of the videophone device on
the side of a normal person while he is talking toward the sign
language telephone device will be explained referring to
Fig. 9. On the screen 1600 of the sign language telephone
device shown in Fig. 8, the following are displayed.

(1) Actual image display unit (1602): an actual 
image taken by a camera of the videophone on the 
side of a normal person is displayed. 

(2) Sign language animation display unit (1606): the 
voice of a normal person is translated to a sign lan- 
guage and displayed as an animation. 

(3) Character display unit (1604): the voice of a nor- 
mal person is recognized and displayed as charac- 
ters. 

(4) Sign-language-animation-for-confirmation display unit
(1608): the sign language expressed by an aurally handicapped
person is displayed in an animation for confirmation.

(5) Character-for-confirmation display unit (1610): the result
of translation of the sign language expressed by an aurally
handicapped person to Japanese is displayed by characters for
confirmation.



On the other hand, the following are displayed on the screen
1700 of the videophone on the side of a normal person shown in
Fig. 9.

(1) Actual image display unit (1702): an actual image taken by
a camera of a sign language telephone device is displayed.

(2) Character display unit (1704): the result of translation of
a sign language to Japanese is displayed in characters.

(3) Characters for confirmation display unit (1706): the result
of recognition of the voice of a normal person is displayed in
characters.

The contents of the character display unit and the characters
for confirmation display unit are generated on the side of the
sign language telephone device. Further, it is possible to
convey the result of translation of a sign language by an
aurally handicapped person to Japanese with voice.

Now coming back to Fig. 7, the process in the image generation
unit 332 shown in Fig. 3 will be explained. There are three
display screen modes of the videophone on the side of a normal
person.

(1) Normal mode: an actual image is displayed.

(2) Animation mode: an animation converted in the gesture to
sign language animation conversion unit is used.

(3) Suppression mode: an image which is registered beforehand
is displayed.

Another method in the suppression mode is to display a still
picture which is registered beforehand in a sign language
telephone device.

In the image generation unit 332, it is judged if it is an
animation mode (702). If it is so, an animation generated using
the data generated in the gesture to sign language animation
conversion unit 328 is used as data to be displayed on the
actual image display unit 1702 of a videophone device on the
side of a normal person (704). If it is not an animation mode,
it is investigated if it is a suppression mode (706). If it is
a suppression mode, the image data which have been registered
beforehand are used as data to be displayed in the actual image
display unit 1702 of a videophone device on the side of a
normal person (708). If it is not a suppression mode, it is
regarded as the normal mode, so that the actual image data are
used as data to be displayed on the actual image display unit
1702 of a videophone device on the side of a normal person
(710).
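
The three-way choice of Fig. 7 amounts to a short selection
function. The Python sketch below assumes that the three
candidate images are already available as values; ScreenMode
and select_image are hypothetical names, and the step numbers
in the comments refer to the text above.

```python
from enum import Enum, auto


class ScreenMode(Enum):
    NORMAL = auto()
    ANIMATION = auto()
    SUPPRESSION = auto()


def select_image(mode, actual_image, animation, registered_image):
    """Pick the data shown in the actual image display unit 1702."""
    if mode is ScreenMode.ANIMATION:      # 702 -> 704
        return animation
    if mode is ScreenMode.SUPPRESSION:    # 706 -> 708
        return registered_image
    return actual_image                   # 710: normal mode
```

For example, select_image(ScreenMode.SUPPRESSION, cam, anim,
still) returns the registered still picture, so no camera image
leaves the sign language telephone device in that mode.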

Next, the process in the image, voice and character
synchronizing unit 356 in the sign-language-to-Japanese-
translation-subsystem, and the process in the image, voice and
character synchronizing unit 442 in the Japanese-to-sign-
language-translation-subsystem will be explained referring to
Fig. 10. The two processes differ only in the translation
results they handle.

It takes time to translate the result of the sign language
recognition to Japanese and output it as voice, for reasons
such as that the translation of one word of the sign language
to Japanese is impossible before the word is completed.
Therefore, in order to synchronize the actual image of an
aurally handicapped person performing the sign language and the
voice obtained by translating the sign language to Japanese and
transmit them to a videophone on the side of a normal person,
it is necessary to delay the transmission of the actual image
for the period of time needed for the process in which the
result of the sign language recognition is translated to
Japanese and converted to voice. In order to synchronize the
actual image and the result of recognition, the time when the
actual image, sign language or the voice is inputted is
recorded beforehand. The basic idea of synchronization is that
the actual image is temporarily stored, and when the sign
language translated to Japanese or the voice translated to the
sign language animation is outputted, the actual image is
displayed so that it coincides with the time given to the
original sign language or voice.

A method for synchronizing a translation result and an actual
image will be explained referring to Fig. 10. In the figure,
from 1802 to 1812 is the period of time in which the sign
language is performed by an aurally handicapped person; from
1814 to 1828 is the period of time in which the actual image
and the synthesized voice of translated Japanese are
transferred, synchronized, to the videophone device on the side
of a normal person; from 1842 to 1850 is the period of time in
which the normal person is speaking Japanese; and from 1830 to
1840 is the period of time in which voice recognition is
performed in the sign language telephone device and the sign
language animation obtained by translating the recognized
characters to the sign language is displayed in synchronization
with the screen of the videophone on the side of the normal
person. A case where a telephone call is made from the sign
language telephone device to an ordinary videophone will be
considered. The time 1802 is the time when the sign language
telephone device is connected to the videophone on the side of
the normal person and the face of the person on the other end
of the line is being confirmed. Then the aurally handicapped
person talks to the normal person in the sign language: "Good
morning!" "How are you?" The periods of time 1804 and 1806 are
the talking time. In the sign language telephone device, for
the time of 1804, when the aurally handicapped person expresses
"Good morning!" in the sign language, translation of it is
performed. After "Good morning!" is expressed in the sign
language, the sign language is translated to Japanese voice,
and for the time of 1818, "Good morning!" is expressed in voice
and displayed by characters together with the actual image for
the time of 1804 on the videophone on the side of a normal
person. When "How are you?" is expressed for the time of 1806
in the sign language, for the time of 1822, "How are you?" is
expressed with voice and displayed with characters together
with the actual image for the time of 1806. The normal person,
looking at the sign language at the time of 1818 and 1822,
answers "I'm fine thank you." for the time of 1844 on the
videophone on the side of the normal person. Of the image and
voice transmitted to the sign language telephone device for the
time of 1844, the voice is translated to the sign language
animation. The sign language animation expressing "I'm fine
thank you." and the actual image for the time of 1844 are
displayed on the sign language telephone device.

However, in the case where the sign language performed for the
time of 1804 is displayed for the time of 1818, even though the
actual image for the time of 1802 is transmitted to the person
at the other end of the line for the time of 1814 with a little
delay, for the time of 1816 it is impossible to recognize the
sign language word until the sign language word is completed,
so that there is no image to display for the time of 1816. The
length of display of a sequence of words differs from that of a
word, so that there is no actual image to display for the time
of 1820. The time of completion of translation of the voice
"And you?" input for the time of 1846 to the sign language will
become the time of 1836, and there is a case where the period
of time becomes longer in comparison with the time of 1844 as
in the case of the sign language animation. In that case, it is
necessary to compress the actual image for the time of 1848 to
the time of 1838. As mentioned above, when an actual image and
the translation result are synchronized, there can be the case
where there is no actual image to display or there is not
enough time to display all of it.

In the case of a sign language telephone device, when there is
no actual image to display, the last still picture is used as a
supplement. As the actual image for the time of 1816, the last
still picture for the time of 1814 is displayed. The same thing
can be applied to the time of 1820.
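
The still-picture fill just described, together with the
handling of surplus images that the text turns to next
(cutting the end part, or displaying in a quick operation), can
be sketched as one Python helper that fits a run of stored
frames into a given number of display slots. The function name
fit_frames, its parameters, and the even subsampling used for
the quick operation are assumptions made for illustration; the
specification does not say how the quick operation picks its
frames.

```python
def fit_frames(frames, n_slots, last_shown=None, policy="quick"):
    """Fit stored actual-image frames into n_slots display slots.

    frames      frames recorded for the interval (may be empty)
    n_slots     number of frame slots available on the output side
    last_shown  frame displayed just before this interval, if any
    policy      "cut" drops the end part; "quick" plays evenly faster
    """
    if n_slots <= 0:
        return []
    if not frames:
        # Nothing to show: repeat the last still picture.
        return [last_shown] * n_slots
    if len(frames) <= n_slots:
        # Too few frames: hold the final frame as a still picture.
        return list(frames) + [frames[-1]] * (n_slots - len(frames))
    if policy == "cut":
        # Surplus frames at the end are simply abandoned.
        return list(frames[:n_slots])
    # Quick operation (assumed form): sample frames evenly across the interval.
    return [frames[i * len(frames) // n_slots] for i in range(n_slots)]
```

With six stored frames and three slots, the "cut" policy keeps
the first three frames, while the "quick" policy keeps every
second frame so that the whole interval is still covered.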

In the case where not all images can be displayed, in a sign
language telephone device, surplus images are simply abandoned
or they are displayed in a quick operation. The time of 1838 is
short in comparison with the time of 1848, so that in order to
shorten the time of 1848, the end part is cut and the next sign
language animation is displayed. The fact that the time of 1838
is shorter than the time of 1848 is caused by the fact that the
time of 1836 is long in comparison with the time of 1846.
During the time of 1838, at the time when an actual image
starts to be displayed, it can be found how much longer the
time of 1836 is than the time of 1846, so that it can be
estimated how much the time of 1836 has to be shortened in
comparison with the time of 1846. According to the estimation,
to make the actual image for the time of 1848 finish within the
time of 1838, the actual image is displayed in the quick
operation mode.

One of the embodiments in which the sign language animation
which is a result of translation, and the actual image and
voice, are displayed in synchronization by the image, voice and
character synchronizing unit is shown in Fig. 11. This is an
embodiment of the process in the image, voice and character
synchronizing unit in the Japanese-to-sign-language-
translation-subsystem. At first it is investigated if there is
a sign language animation to be output (802). If there is, the
sign language animation and the image of one frame are
displayed (810), and next, it is again investigated if there is
a sign language animation to be output. Every time one frame is
displayed, the number of frames of the sign language animation
to be displayed is decreased. When all sign language animations
have been displayed, there is no more sign language animation
to display. If there is no sign language animation to be
displayed, it is investigated if there is an actual image to be
displayed (804). If there is an image to be displayed, one
frame of the actual image is displayed (808). If there is not,
the last image is displayed (806). When one frame has been
displayed, the process returns to the start and it is
investigated again if there is a sign language animation to be
displayed. The voice accompanying an actual image is outputted,
one frame's worth at a time, every time the actual image is
displayed frame by frame.
The embodiment of the process in the image, voice and character
synchronizing unit 356 in the sign-language-to-Japanese-
translation-subsystem will be explained referring to Fig. 12.
The voice translated from the sign language and synthesized in
the voice synthesizing unit is managed by the frame, which is
the display unit of the actual image. It is investigated if
there is a synthesized voice to be output (2002). If there is
not, it is investigated if there is an actual image to be
displayed (2004). If there is an actual image to be displayed,
one frame of the actual image is displayed (2008). If there is
not, the last image is displayed (2006). If there is a
synthesized voice to be output, the synthesized voice and the
actual image for one frame are outputted (2010) and the process
returns to the start. It is investigated again if there is a
synthesized voice to be output. The voice accompanying the
actual image is outputted, one frame's worth at a time, every
time the actual image is displayed frame by frame.
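
The loops of Fig. 11 and Fig. 12 share one shape: on every
frame tick, output one frame of the translated result if any
remains, and pair it with the next actual image or, failing
that, with the last image shown. The Python sketch below is a
minimal illustration under stated assumptions: the function
name synchronize, the fixed tick count, and the reuse of the
last image when translated output is pending but no new actual
image exists are hypothetical choices, not details given in the
specification. Step numbers in the comments follow Fig. 11,
with the Fig. 12 numbers in parentheses.

```python
from collections import deque


def synchronize(translated, actual, ticks, initial_image=None):
    """Pair translated frames with actual-image frames, one pair per tick.

    translated  animation frames (Fig. 11) or synthesized-voice frames (Fig. 12)
    actual      stored actual-image frames
    Returns a list of (translated_frame_or_None, image) tuples.
    """
    translated = deque(translated)
    actual = deque(actual)
    last_image = initial_image
    output = []
    for _ in range(ticks):
        if actual:
            last_image = actual.popleft()    # 808 (2008): next actual frame
        # otherwise 806 (2006): keep showing the last image
        if translated:                       # 802 (2002)
            frame = translated.popleft()     # 810 (2010): output together
        else:
            frame = None                     # 804 (2004): image only
        output.append((frame, last_image))
    return output
```

Because each tick consumes at most one frame from each queue,
the count of translated frames still to be output decreases by
one per displayed frame, as the description of Fig. 11 states.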

The result in the voice recognition unit 412, as shown in
Fig. 13, is also displayed as Japanese characters (902) besides
the sign language animation. These are displayed on the
character display unit 1604 on the side of the sign language
telephone device as shown in Fig. 8.

Next, the process of the function of an answer phone in the
sign language telephone device will be explained. There is a
probability that the sign language telephone device is called
by a normal person and also by an aurally handicapped person,
so that in order to correspond to both cases, it is necessary
that the response message of an answer phone is able to respond
in the sign language or characters other than voice. A normal
person, as well as an aurally handicapped person, may also
register a response message in a sign language telephone
device.

The device may be called by various kinds of people, so
that some users may not want to display an actual image.
There are 3 modes in the response message as in the case
of conversation.

(1) Ordinary mode: an actual image is displayed. 

(2) Animation mode: an animation converted in the 
gesture to sign language animation conversion unit 
is used. 

(3) Suppression mode: an image is not displayed
on the main screen or an image registered before-
hand is displayed.

Further, when a normal person registers a
response message to the sign language telephone
device, a sign language mode shall be added to the
modes for an aurally handicapped person described
in (1) to (3) above.

(4) Sign language mode: the translated sign lan- 
guage is displayed as an animation on the main 
screen. 

Before registration is started, an answer phone reg- 
istration mode is selected out of the above 4 modes by 
the mode switch 130. 
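The four registration modes can be written down as a small enumeration. The sketch below is illustrative only; the names `ResponseMode` and `selectable_modes` are hypothetical and do not appear in the specification. It encodes the rule stated above that the sign language mode (4) is added when a normal person registers the response message.

```python
from enum import Enum

class ResponseMode(Enum):
    ORDINARY = 1        # (1) an actual image is displayed
    ANIMATION = 2       # (2) a gesture-driven sign language animation is used
    SUPPRESSION = 3     # (3) no image, or an image registered beforehand
    SIGN_LANGUAGE = 4   # (4) translated sign language shown as an animation

def selectable_modes(registered_by_normal_person):
    """Modes offered by the mode switch 130 before registration (sketch)."""
    modes = [ResponseMode.ORDINARY, ResponseMode.ANIMATION,
             ResponseMode.SUPPRESSION]
    if registered_by_normal_person:
        # Mode (4) is added for a normal person registering by voice.
        modes.append(ResponseMode.SIGN_LANGUAGE)
    return modes
```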

The details of the process in the answer phone control
unit 366 in the
sign-language-to-Japanese-translation-subsystem will be
explained referring to Fig. 14. The screen of a response
message in the ordinary mode is shown in Fig. 21. This
picture is displayed in the videophone on the call
originating side. An actual image is displayed in 2102, a
sign language animation in a response message is shown in
2108, and a text is shown in 2104. At the same time, voice
is outputted. The screen of a response message in the sign
language mode is shown in Fig. 22. The screen is displayed
in the videophone on the call originating side. In the
sign language mode, an actual image is not displayed, and
an animation is displayed in 2202.

At first, it is investigated whether it is the mode in
which a response message of an answer phone is registered
(1104). If it is not a registration mode, images or voice
data generated in the image, voice and character
synchronizing unit 356 in the
sign-language-to-Japanese-translation-subsystem are
transferred to the videophone picture generation unit 360
(1106). If it is a registration mode, it is investigated
if the response message is to be registered in the sign
language (1108). If it is the registration in the sign
language, image or voice data generated in the image,
voice and character synchronizing unit in the
sign-language-to-Japanese-translation-subsystem are stored
in the answer phone data memory unit 378 (1110). When a
registration message is generated, the display in the
modes described in (1) to (3) is made possible by
selecting a screen mode to be displayed on a videophone on
the side of a normal person. If it is the registration in
voice, the input from the input changeover unit 458 is
changed over from the voice data obtained from the
videophone voice data receiving unit 406 to the voice data
obtained from the voice input unit 206 through a
microphone (1112). The image or voice data generated in
the image, voice and character synchronizing unit 454 in
the Japanese-to-sign-language-translation-subsystem are
stored in the answer phone data memory unit 378 (1114). In
the case of the sign language mode, the sign language
translated in the sign language animation generation unit
438 is displayed as an animation on the actual image
display unit 1702 in place of the actual image.
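The branching of Fig. 14 can be summarized in a short sketch. It is a simplified illustration under stated assumptions: the function name `answer_phone_control`, the list standing in for the answer phone data memory unit 378, and the returned destination labels are all hypothetical. Only the step numbers and the order of the tests come from the description above.

```python
def answer_phone_control(registering, in_sign_language, data, memory_378):
    """Return (steps_taken, destination) for one pass of Fig. 14 (sketch)."""
    steps = [1104]                       # registration mode?
    if not registering:
        steps.append(1106)               # forward to picture generation unit 360
        return steps, "videophone picture generation unit 360"
    steps.append(1108)                   # registration in the sign language?
    if in_sign_language:
        memory_378.append(("sign", data))
        steps.append(1110)               # store output of synchronizing unit 356
    else:
        steps.append(1112)               # switch input changeover unit 458 to mic
        memory_378.append(("voice", data))
        steps.append(1114)               # store output of synchronizing unit 454
    return steps, "answer phone data memory unit 378"
```

For example, `answer_phone_control(True, False, "hello", [])` follows the voice registration path through steps 1104, 1108, 1112 and 1114.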

The process of the answer phone control unit when there is
a phone call to the telephone device in the answer phone
mode will be explained referring to Fig. 15. When there is
a phone call in the answer phone mode, a response message
is taken out from the answer phone data memory unit 378
(1202), and the message is transferred to the videophone
picture generation unit 360 (1204). If there is a message
from the call originating side, the message is stored in
the answer phone (1208). In the case of playback, when an
aurally handicapped person is on the receiving side and a
normal person is on the transmitting side, it is possible
to read the message by translating it to the sign language
in the Japanese-to-sign-language-translation-subsystem.
When a normal person is on the receiving side and an
aurally handicapped person is on the transmitting side, it
is possible to read the message by translating it to
Japanese in the
sign-language-to-Japanese-translation-subsystem.
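A minimal sketch of the call handling of Fig. 15 and of the choice of subsystem at playback is given below. The function names and the list used as message storage are hypothetical. The sketch assumes that the receiver's use of the sign language alone decides which translation subsystem is used.

```python
def handle_incoming_call(response_message, caller_message, stored_messages):
    """Answer a call in the answer phone mode (sketch of Fig. 15)."""
    outgoing = response_message                   # 1202: take out from unit 378
    # 1204: 'outgoing' is what is handed to picture generation unit 360.
    if caller_message is not None:
        stored_messages.append(caller_message)    # 1208: store caller's message
    return outgoing

def playback_subsystem(receiver_uses_sign_language):
    """Pick the subsystem that translates a stored message for the receiver."""
    if receiver_uses_sign_language:
        return "Japanese-to-sign-language-translation-subsystem"
    return "sign-language-to-Japanese-translation-subsystem"
```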

Next, the explanation will be given about the received
picture display mode on the screen in the sign language
telephone device. The setting of the received picture
display mode is performed with the mode switch 130. The
received picture display modes are the sign language
standard mode, sign language enlargement mode, sign
language animation mode and non-sign-language mode.

The sign language standard mode is used to display the
actual image display unit 1602 or sign language animation
display unit 1606 in the allotment decided in the sign
language telephone device as shown in Fig. 8.

In the sign language enlargement mode, as shown in Fig.
16, the sign language animation display unit 1606 in the
sign language standard mode shown in Fig. 8 is displayed
in a larger pattern than that in the actual image display
unit 1602, and they are respectively displayed as a sign
language animation display unit 1006 and actual image
display unit 1002.

In the sign language animation mode, as shown in Fig. 17,
the actual image display unit 1602 and the sign language
animation display unit 1606 in the sign language standard
mode shown in Fig. 8 are interchanged, and they are
respectively displayed as an actual image display unit
1906 and a sign language animation display unit 1902. The
character display unit 1904 is also displayed being
enlarged concomitant with the sign language animation
display unit 1902.

Both the sign language enlargement mode and the sign
language animation mode make the sign language animation
easier to watch by displaying it in a larger pattern. The
non-sign-language mode is the mode in which the sign
language is not used, and the device is the same as an
ordinary videophone.

The display method of a received picture in the sign
language telephone device will be explained referring to
Fig. 18. It is investigated if the mode is the sign
language enlargement mode (1402). If it is so, the sign
language animation display unit is displayed being
enlarged as shown in Fig. 16 (1404). If it is not, it is
investigated if the mode is the sign language animation
mode (1406). If it is so, the actual image display unit
and sign language animation display unit in the sign
language standard mode are interchanged with each other,
and displayed as shown in Fig. 17 (1408). In the sign
language standard mode, the sign language animation is
displayed in the form of the standard configuration as
shown in Fig. 8 (1412). In the case of the
non-sign-language mode, translation is not performed, and
pictures are displayed as an ordinary videophone (1414).
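The selection of Fig. 18 amounts to a dispatch on the received picture display mode. The following sketch is illustrative. The names `DisplayMode` and `select_layout` are hypothetical, and the returned pair simply names the figure whose layout is used and the step number quoted above.

```python
from enum import Enum

class DisplayMode(Enum):
    STANDARD = "sign language standard mode"
    ENLARGEMENT = "sign language enlargement mode"
    ANIMATION = "sign language animation mode"
    NON_SIGN = "non-sign-language mode"

def select_layout(mode):
    """Return (layout, step) following the order of tests in Fig. 18."""
    if mode is DisplayMode.ENLARGEMENT:           # 1402
        return "Fig. 16", 1404                    # animation area enlarged
    if mode is DisplayMode.ANIMATION:             # 1406
        return "Fig. 17", 1408                    # image/animation areas swapped
    if mode is DisplayMode.STANDARD:
        return "Fig. 8", 1412                     # standard configuration
    return "ordinary videophone", 1414            # no translation performed
```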

Next, the setting process of the voice in the voice
synthesis will be explained referring to Fig. 19. At
first, a desirable type of synthesized voice is selected
(1502). It is set so that a synthesized voice is outputted
according to the above selection (1504). This is because
the uneasy feeling of listeners of the synthesized voice
will be decreased if the quality of the synthesized voice
can be selected. For example, the synthesized voice can be
made a male voice or female voice according to the sex of
the person who uses the sign language, or the synthesized
voice can be a high tone voice for a young person or a low
tone voice for an aged person.
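The selection of step 1502 can be illustrated with a small sketch. The specification gives no numeric criteria, so the age thresholds, the "medium" pitch and the names `VoiceSetting` and `select_voice` below are purely hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoiceSetting:
    gender: str   # "male" or "female", following the signer's sex
    pitch: str    # "high", "medium" or "low"

def select_voice(sex, age, young_below=30, aged_from=60):
    """Choose a synthesized voice type (1502); thresholds are assumptions."""
    if age < young_below:
        pitch = "high"      # high tone voice for a young person
    elif age >= aged_from:
        pitch = "low"       # low tone voice for an aged person
    else:
        pitch = "medium"    # not in the specification; filler for the sketch
    return VoiceSetting(gender=sex, pitch=pitch)
```

The returned setting would then be applied to the voice synthesizing unit in step 1504.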

As a variation of the first embodiment, the following can
be considered: it is made possible to communicate with a
foreigner by translating the voice or character train of a
videophone transmitted in a foreign language, by providing
a foreign language translation unit in the sign language
telephone control device 110 of the sign language
telephone device. In that case, a foreign language
translation program will be provided in the magnetic disc
202 shown in Fig. 2, and it will be loaded in the memory
222, and the operation will be executed under the control
of the CPU 210. Further, the following can easily be
implemented by persons working in this line of business:
translation of Japanese sign language to a foreign
language and the transmission of it to a videophone on the
side of a normal foreigner, or a both-way communication
between a foreign sign language and Japanese. The foreign
language translation unit can be provided in a videophone
control device other than the sign language telephone
control device (either in the sign language telephone
device or in the videophone device). In the case of the
translation of Japanese sign language to English, a
Japanese-to-English translation unit will be incorporated
in the sign-language-to-Japanese-translation-unit 334
shown in Fig. 3, and an English voice synthesizing
function will be added to the voice synthesizing unit 350.
In the case of the translation of English to Japanese sign
language, an English voice recognizing function will be
added to the voice recognition unit 412 shown in Fig. 4,
and an English-to-Japanese translation unit, which
converts the English text output by the voice recognition
unit 412 to a Japanese text, will be incorporated in the
Japanese-to-sign-language-translation-unit 416.
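The variation above adds one translation stage to an existing chain of units. A sketch of that composition is shown below. The helper `pipeline` and every stage function are toy stand-ins invented for the example, not actual recognition or translation routines; they only show where the hypothetical Japanese-to-English stage would sit between the unit 334 and the voice synthesizing unit 350.

```python
from functools import reduce

def pipeline(*stages):
    """Chain processing stages so each one feeds the next (hypothetical helper)."""
    return lambda data: reduce(lambda value, stage: stage(value), stages, data)

# Toy stand-ins for the real units; each just tags its input.
recognize_sign = lambda gestures: [g.upper() for g in gestures]  # recognition
sign_to_japanese = lambda words: " ".join(words)                 # unit 334
japanese_to_english = lambda text: f"en({text})"                 # added stage
synthesize_english = lambda text: f"voice<{text}>"               # unit 350 + English

sign_to_english_voice = pipeline(
    recognize_sign, sign_to_japanese, japanese_to_english, synthesize_english
)
result = sign_to_english_voice(["hello", "thanks"])
```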

Another variation of the first embodiment makes possible a
simple mode in which an ordinary telephone device
(including PHS) for voice only, not a videophone, can be
connected on the side of a normal person. In other words,
when a system is so constituted that a sign language
animation is generated from the voice sent from the
telephone and displayed in the sign language telephone
control device on the side of the sign language telephone
device, and the response is performed with only voice, a
long distance communication (conversation) between an
aurally handicapped person and a normal person can be
realized in a simple mode, even if there remains something
to be studied as a both-way real-time communication.

Next, a second embodiment will be explained. In the
embodiment explained above, various kinds of functions
were concentrated on the side of the sign language
telephone device. In the second embodiment, the functions
in the sign language telephone device explained in the
first embodiment are subjected to centralized control by
providing a sign language interpretation server 1300 on a
wide area communication network such as BISDN or ATM as
shown in Fig. 20. Thereby, the sign language recognition,
generation of sign language animation, voice recognition,
voice synthesis, etc. which are executed in the sign
language telephone control device 110 are executed in the
sign language interpretation server 1300. Owing to the
configuration mentioned above, it is made possible to
offer a system in which sign language telephone devices
for respective persons on the side of aurally handicapped
persons are not needed; it is only required to connect a
special input device or the like for inputting hand
gestures to an external input interface of an ordinary
videophone device, and the recognition of the sign
language, etc. can be performed in the sign language
interpretation server. Thereby the system may have a
possibility to be widely used in the market.

In the second embodiment, the sign language telephone
device comprises a TV set 1304, camera 1314, microphone
1322, speaker 1318 and a mode switch 1346, and it is also
provided with a special glove 1308 such as a Cyber Glove
(the trade mark of Virtual Technologies) (of course, the
use of a Data Glove is possible). The main inputs of the
sign language telephone control device 1306 are the input
from a glove and the input of mode data setting from the
mode switch 1346. The input data are transmitted to the
sign language interpretation server 1300 through the
videophone control device 1324. The image data inputted
from the camera 1314 are transmitted to the sign language
interpretation server 1300 through the videophone control
device 1324. The camera 1314, speaker 1318 and the
microphone 1322 are connected to the videophone control
device 1324, which is the point of difference from the
first embodiment. This is because the data of images or
voices are directly exchanged with the sign language
interpretation server through the network. (In the first
embodiment, these equipment are connected to the sign
language telephone control device 110.)

The sign language telephone device is connected to a wide
area network 1306, and the system is so constituted that
the sign language interpretation server 1300 can be
accessed by the sign language telephone device. The
videophone device, in the same way as in the case of the
first embodiment, comprises a TV set 1328, camera 1338,
microphone 1334, speaker 1342 and a videophone control
device 1330. This videophone device is also connected to
the wide area network 1306.

The operation of the second embodiment will be explained.
Hand gesture data or image data sent from the sign
language telephone control device 1324 are used for the
translation from the sign language to Japanese, and in the
sign language interpretation server 1300, the sign
language is translated to Japanese using the data, and an
image as shown in Fig. 9 is generated as a display
picture, and together with the generated synthesized
voice, it is transferred to the videophone control device
1330.

The voice sent from the videophone 1330 is recognized in
the sign language interpretation server 1300, converted to
Japanese characters and translated to the sign language.
As a display picture, a picture as shown in Fig. 8 is
generated and it is transferred to the videophone control
device 1324 on the side of an aurally handicapped person.
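The two directions handled by the sign language interpretation server 1300 can be sketched as one routing function. This is only an illustration under stated assumptions: the packet fields, the function name `interpret` and the four callables passed in are hypothetical, and real recognition, translation and synthesis are replaced by whatever functions the caller supplies.

```python
def interpret(packet, sign_to_japanese, synthesize, recognize, japanese_to_sign):
    """Route one packet through the server 1300 (hypothetical sketch)."""
    if packet["source"] == 1324:
        # From the sign language side: gestures -> Japanese -> voice.
        text = sign_to_japanese(packet["gestures"])
        return {"destination": 1330, "picture": "Fig. 9",
                "text": text, "voice": synthesize(text)}
    # From the videophone 1330: voice -> Japanese -> sign animation.
    text = recognize(packet["voice"])
    return {"destination": 1324, "picture": "Fig. 8",
            "text": text, "animation": japanese_to_sign(text)}
```

Because the translation functions are passed in, the same routing also accommodates the foreign language and voice-only telephone variations without changing the server loop.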

As described above, according to the second embodiment,
the system is managed under the centralized control of the
sign language interpretation server 1300; thereby, by
utilizing the network environment prepared with videophone
devices, communication between a plurality of sign
language telephone devices and videophone devices, or
between sign language telephone devices, is made possible.

The technology of communications between various foreign
languages and the sign language by providing a foreign
language translation unit, or between the voice from an
ordinary telephone and the sign language, is also
applicable to the second embodiment.

In the first and the second embodiment, as a means to
input the sign language, a special glove such as a Data
Glove is used; however, the present invention is
applicable to the case where the sign language is inputted
as images without using a Data Glove and it is recognized
in the image recognition unit. Besides the above-mentioned
examples, various kinds of combinations of the cases
written in the embodiments or their variations are
possible as the occasion may demand.

According to the present invention, it is made possible to
offer a simple device with which an aurally handicapped
person who uses the sign language is able to converse with
a normal person at a distant place who does not know the
sign language. Long distance communication (conversation)
in the sign language through an existing network is made
possible.

Further, either the translation mode or the
non-translation mode can be selected; in the
non-translation mode, a hand move other than the sign
language made while a sign language telephone device is
being used is not translated; thereby the degree of
freedom in using the sign language telephone device is
increased. The actual image of an aurally handicapped
person need not be output to a videophone device on the
side of a normal person, and the display on the videophone
can be made with a sign language animation, which helps
much to protect privacy. Since the sign language telephone
device on the side of an aurally handicapped person and
the display on the videophone device on the side of a
normal person are synchronized with each other, good
communication can be enjoyed.

A response message for a telephone call which is made
while nobody is in is generated by combining the voice,
image, character or sign language animation, which is
effective for protecting privacy. The characters, which
are a result of voice recognition of a normal person, are
displayed on a videophone device on the side of a normal
person together with a character train obtained in
translating the recognition result of the sign language,
so that a person on the side of a normal person is able to
confirm if the contents of his talk are correctly
transmitted to the sign language telephone device.

Industrial Applicability 

As described above, the present invention, as the sign
language telephone device which can be connected to a
videophone or a telephone device by voice through a
telephone line as a network (public network), is suitable
to be used for the conversation between an aurally
handicapped person who uses the sign language and a normal
person at a distant place who does not know the sign
language.

Claims 

1. A sign language telephone device comprising a
sign language telephone control means being connected
to a videophone device through a network, in which
said control means translates an input sign language,
generates synthesized voices to transmit the same to
said videophone, recognizes the voices from said
videophone, and generates a sign language animation
from recognized results for displaying.

2. A sign language telephone device as designated in
claim 1, wherein said sign language telephone device
comprises: a TV set, camera, microphone, speaker and
sign language input means, and said sign language
telephone control means comprises:

a sign-language-to-Japanese-translation-subsystem
comprising: a sign language recognition unit which
recognizes the sign language inputted from said sign
language input means and translates it to a sign
language words train, a
sign-language-to-Japanese-translation-unit which
translates the recognized sign language words train to
Japanese, and a voice synthesizing unit which outputs
the translated Japanese as voices, and

a Japanese-to-sign-language-translation-subsystem
comprising a voice recognition unit which converts the
voices from said videophone device into a Japanese
character train.

3. A sign language telephone device as defined in
claim 2, wherein said
sign-language-to-Japanese-translation-subsystem
comprises a
sign-language-to-Japanese-translation-control-unit
which controls the operation of said
sign-language-to-Japanese-translation-subsystem, which
comprises a translation mode and a non-translation
mode, and in the translation mode, the recognition
result of the sign language recognition unit is
controlled to be transmitted to the
sign-language-to-Japanese-translation-unit, and in the
non-translation mode the recognition result of the sign
language recognition unit is controlled not to be
transmitted to the
sign-language-to-Japanese-translation-unit.

4. A sign language telephone device as defined in
claim 3, wherein said sign language telephone device
comprises at least one of the following means: a means
to decide the mode, the translation mode or the
non-translation mode, with a button; a means to decide
the adoption of the translation mode when the face or
body is facing forward and the adoption of the
non-translation mode when the face or body is not
facing forward; a means to decide the adoption of the
non-translation mode when a hand is at the home
position; a means to decide the adoption of the
non-translation mode when the move of hands is stopped;
and a means in which hand gestures to show the
translation mode and the non-translation mode are set
beforehand and the adoption of the translation mode or
the non-translation mode is decided with the hand
gesture.

5. A sign language telephone device as defined in
claim 2, wherein said
sign-language-to-Japanese-translation-subsystem
comprises a gesture to sign language animation
conversion unit which generates animations by using the
positional data of hand gestures in the sign language
inputted from said sign language input means as the
data for moving and generating animations.

6. A sign language telephone device as defined in
claim 5, wherein said
sign-language-to-Japanese-translation-subsystem
comprises an actual image mode, an animation mode and a
suppression mode, and also comprises an image
generation unit for controlling the operation in said
sign-language-to-Japanese-translation-subsystem, which
inputs a sign language animation generated in the
gesture to sign language animation conversion unit and
an actual image from said camera, and it is so
controlled by said image generation unit that in the
case of the actual image mode, actual image data are
displayed, in the case of the animation mode, a sign
language animation is displayed and in the case of the
suppression mode, pictures are not displayed or a still
picture registered beforehand is displayed.

7. A sign language telephone device as defined in
claim 6, wherein said
sign-language-to-Japanese-translation-subsystem
comprises an answer phone control unit which generates
a response message for a telephone call made in the
answer phone mode using actual images inputted from
said camera, sign language animations generated in the
gesture to sign language animation conversion unit, an
image selected out of still pictures registered
beforehand, a Japanese character train obtained in
translating the sign language to Japanese in the
sign-language-to-Japanese-translation-unit and voices
generated in the voice synthesizing unit.

8. A sign language telephone device as defined in
claim 5, wherein said
sign-language-to-Japanese-translation-subsystem
comprises a first image, voice and character
synchronizing unit which generates images and voices to
be transmitted to said videophone device in composing
images, a character train and voices, while
synchronizing images generated in the image generation
unit, a Japanese character train generated in the sign
language translation unit and the voices generated in
the voice synthesizing unit.

9. A sign language telephone device as defined in
claim 8, wherein said
sign-language-to-Japanese-translation-subsystem
comprises a videophone picture generation unit which
generates pictures to be sent to said videophone device
in adding Japanese characters, being the result of
recognition in the voice recognition unit in said
Japanese-to-sign-language-translation-subsystem, to an
image generated in the image, voice and character
synchronizing unit.

10. A sign language telephone device as defined in
claim 2, wherein said sign language telephone device
comprises a means to display a Japanese character
train, which is a result of translation of the sign
language inputted from said sign language input means,
in a part of the picture to be transmitted to said
videophone device.

11. A sign language telephone device as defined in 
claim 2, wherein said sign language telephone 
device comprises a means to display a Japanese 
character train, which is a result of translation of the 
sign language inputted from said sign language 
input means, and at least more than one animation 
among the animations generated from the sign lan- 
guage recognition result in the animation genera- 
tion unit in a part of the screen of said TV set. 

12. A sign language telephone device as defined in
claim 2, wherein said
Japanese-to-sign-language-translation-subsystem
comprises a Japanese-to-sign-language-translation-unit
which translates Japanese recognized in the voice
recognition unit to the sign language and the sign
language animation generation unit which displays the
translated sign language as a sign language animation.

13. A sign language telephone device as defined in
claim 12, wherein said
Japanese-to-sign-language-translation-subsystem
comprises a second image, voice and character
synchronizing unit which composes a character train,
sign language animations and images in synchronizing a
character train, a result of recognition in the voice
recognition unit, sign language animations generated in
the sign language animation generation unit and actual
images transmitted from said videophone device.

14. A sign language telephone device as defined in
claim 12, wherein said sign language telephone device
comprises a means which translates the result
recognized in the voice recognition unit to the sign
language, and displays the translated sign language in
a part of said TV set screen as a sign language
animation generated in the sign language animation
generation unit.

15. A sign language telephone device as defined in
claim 14, wherein said sign language telephone device
comprises at least 1 means out of the 2 means shown
below: one means is a display means to display 2
display areas as shown below on said TV screen by
turns: one is an image display area to display actual
images transmitted from said videophone and another is
a display area to display the sign language animations
generated in the sign language animation generation
unit in recognizing the voices transmitted from said
videophone and translating them to animations; and
another means is an enlargement means to enlarge the
display area for the sign language animations generated
in the sign language animation generation unit in
recognizing the voices transmitted from said videophone
and translating them to the sign language.

16. A sign language telephone device as defined in
claim 2, wherein said
sign-language-to-Japanese-translation-subsystem
comprises a means to be able to select voice quality or
adjustment of voice synthesis.

17. A sign language telephone device as defined in
claim 1, wherein said sign language telephone device
comprises a language translation means which allows
communications to be performed between the sign
language in a first language and the voice in a second
language.

18. A sign language telephone device connected to a
telephone device through a network, comprising a means
which translates the input sign language to generate
synthesized voices and transmits them to said telephone
device, and recognizes the voice from said telephone
device and generates and displays at least one of a
sign language animation and a character train.

19. A sign language telephone network system
comprising a sign language interpretation server and
being connected to a network, wherein said sign
language interpretation server translates sign
languages inputted from a first videophone, included in
the configuration of a sign language telephone device,
to generate synthesized voices, and transmits them to a
second videophone, and also said sign language
interpretation server recognizes the voices from the
second videophone and generates sign language
animations, and makes them displayed in the second
videophone device.